feat: add VLMRenderJudge metric #332

e06084 · 2026-01-12T09:58:17Z

No description provided.

gemini-code-assist · 2026-01-12T09:58:38Z

Summary of Changes

Hello @e06084, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the multimodal evaluation capabilities by introducing VLMRenderJudge, a metric designed for visually assessing OCR quality. Unlike traditional text-based comparisons, this metric renders OCR results into images and leverages VLMs to judge their consistency against original images. This approach is particularly beneficial for evaluating complex content like mathematical formulas and structured tables, offering a more human-like assessment of OCR accuracy. The accompanying RenderTool provides the necessary functionality to convert various content types into visual representations for this evaluation.

Highlights

New VLMRenderJudge Metric: Introduced a novel visual OCR quality evaluation metric that uses a Vision-Language Model (VLM) to compare rendered OCR output with original images.
RenderTool for Content Visualization: Added a RenderTool capable of converting text, LaTeX equations, and HTML tables into images, which is crucial for the VLMRenderJudge workflow.
Comprehensive Documentation: Provided detailed English and Chinese guides for VLMRenderJudge, covering its principles, usage, and integration.
Expanded Test Coverage: Included new test data and unit tests for both VLMRenderJudge and RenderTool to ensure reliability and correctness.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new VLMRenderJudge metric for visually evaluating OCR quality. This is a significant feature that uses a "Render -> Judge" pattern, where OCR output is rendered as an image and compared to the original using a VLM. The implementation includes a new RenderTool for handling text and LaTeX rendering, comprehensive documentation in both English and Chinese, and thorough unit tests. The code is well-structured, but there are a few areas for improvement, particularly in the RenderTool regarding security best practices, portability of LaTeX rendering, and robustness in handling special characters. The documentation also has a minor error in the installation command.
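The "Render -> Judge" pattern described above can be sketched in a few lines. This is only an illustration of the control flow, not the PR's implementation: `render_then_judge`, `JudgeVerdict`, `fake_render`, and `fake_judge` are hypothetical names, and the stand-in functions compare raw bytes where a real pipeline would render an image and call a VLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JudgeVerdict:
    """Hypothetical result type: whether the two images agree, and why."""
    consistent: bool
    reason: str


def render_then_judge(
    original_image: bytes,
    ocr_content: str,
    render: Callable[[str], bytes],
    judge: Callable[[bytes, bytes], JudgeVerdict],
) -> JudgeVerdict:
    """Render the OCR output, then ask the judge to compare it with the original."""
    rendered_image = render(ocr_content)
    return judge(original_image, rendered_image)


# Stand-ins (assumptions for illustration only): a real renderer would
# produce an image and a real judge would query a vision-language model.
def fake_render(content: str) -> bytes:
    return content.encode("utf-8")


def fake_judge(original: bytes, rendered: bytes) -> JudgeVerdict:
    same = original == rendered
    return JudgeVerdict(same, "identical bytes" if same else "bytes differ")


verdict = render_then_judge(b"E = mc^2", "E = mc^2", fake_render, fake_judge)
print(verdict)
```

Passing the renderer and judge in as callables keeps the two stages independently testable, which is one reasonable way to unit-test such a metric without a live VLM.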

gemini-code-assist · 2026-01-12T10:02:02Z

dingo/model/llm/agent/tools/render_tool.py

+        # Escape special characters in text mode
+        # (simplified - full implementation would be more complex)
+        return content


The _preprocess_latex method currently doesn't escape special LaTeX characters (e.g., _, ^, &, %, $, #, {, }). If the input content is not already a valid LaTeX string and contains these characters, it can cause the xelatex compilation to fail. While the comment acknowledges this is a simplified implementation, it's a potential source of bugs for real-world OCR data. Consider adding basic escaping for common special characters.
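A minimal sketch of the kind of escaping suggested here, assuming the input is plain text destined for LaTeX text mode (content that is already valid LaTeX, such as a formula, should not be passed through it). The helper name `escape_latex_text` is hypothetical and is not part of the PR.

```python
# Map each LaTeX special character to a text-mode-safe replacement.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex_text(text: str) -> str:
    """Escape LaTeX special characters in plain text.

    Each character is visited exactly once, so the backslashes and braces
    introduced by a replacement are never escaped a second time.
    """
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


print(escape_latex_text("profit_2025 & loss: 50% of $100 #1"))
```

A chain of `str.replace` calls would be shorter, but it is order-sensitive: escaping `\` after `&` would corrupt the `\&` already produced. The single pass avoids that.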

gemini-code-assist · 2026-01-12T10:02:02Z

docs/en/vlm_render_judge_guide.md

+
+```bash
+# Basic dependencies
+pip install dingo pillow


The installation command pip install dingo pillow is incorrect. The package name on PyPI is dingo-python. This will cause installation to fail for users following the guide.

Suggested change

-pip install dingo pillow
+pip install dingo-python pillow

gemini-code-assist · 2026-01-12T10:02:02Z

docs/vlm_render_judge_guide.md

+
+```bash
+# 基础依赖
+pip install dingo pillow


The installation command pip install dingo pillow is incorrect. The package name on PyPI is dingo-python. This will cause installation to fail for users following the guide.

Suggested change

-pip install dingo pillow
+pip install dingo-python pillow

e06084 added 2 commits January 12, 2026 17:57

feat: add VLMRenderJudge metric

7a71b10

x

aaa52aa

gemini-code-assist bot reviewed Jan 12, 2026

fix comment

447fc23

e06084 force-pushed the dev branch from db77b94 to 447fc23 Compare January 12, 2026 10:13

e06084 and others added 2 commits January 12, 2026 18:38

add mineru orc tool

858dc12

📚 Auto-update metrics documentation

ba6ef56

e06084 merged commit a63df8e into MigoXLab:dev Jan 13, 2026
